Palantir Technologies

VAST 2009 Challenge
Grand Challenge

Authors and Affiliations

Palantir Technologies – VAST09 Team
Brandon Wright, Palantir Technologies, bwright@palantirtech.com
Jason Payne, Palantir Technologies
Matt Steckman, Palantir Technologies

Tools

Overview: Palantir is a platform for collaborative, all-source analysis and operations, enabling geospatial, social-network, temporal, statistical, and structured and unstructured analysis. Palantir provides flexible tools to import and model data, intuitive constructs to search against this data, and powerful techniques to iteratively define and test hypotheses. Our platform is most highly valued for:

Background: Palantir is operational today at many of the most prestigious intelligence, defense, law enforcement, and regulation/oversight organizations in the world. Palantir was put together by the founders of PayPal, capitalizing on the lessons learned by their anti-fraud department. Facing highly coordinated cyber attacks aimed at committing payment fraud and exploiting sensitive consumer information, that department needed an entirely new approach. Existing technology was poorly suited to dealing with sparse, cyber-specific data. To defeat the international fraud rings, analysts needed high-level conceptual access to the data. The analyst-driven intelligence analysis tools that eventually became the Palantir platform were a direct outgrowth of this effort.

Company Web site:
http://www.palantirtech.com

Check out our Analysis Blog to see more analysis using Palantir: http://www.palantirtech.com/government/analysis-blog.

Video

Palantir_GC_Video.wmv

Answers

GC.1: Please describe the scenario supported by your analysis of the three mini-challenges in a Debrief.

Key Judgments

Espionage at the Embassy in Flovania

Malicious Insider Leaks Data

Our investigation into the data leak at the Embassy in Flovania has resulted in a reasonable suspicion that an Embassy employee, probably employee 30, has surreptitiously accessed 12 fellow employees’ computers on 18 separate occasions. These incidents occurred every Tuesday and Thursday, increasing from 1 to 3 times per day, over a four-week period beginning on January 8, 2008. All 18 of these unauthorized data transmissions sent payload requests averaging over 8 million bytes in size to IP address 100.59.151.133 over port 8080. Additionally, these transmissions sent the largest payload requests of all network traffic using port 8080.

Based on an investigation combining Embassy IP logs and prox card data, we have narrowed our suspicion to employee 30. Employee 30 was at his desk when employee 31’s computer was compromised. Since employee 30 shares an office with employee 31, it is likely that employee 30 either witnessed or carried out the unauthorized computer access. It is unlikely that employee 30 was absent at the time that his officemate’s computer was compromised. The IP logs show network activity on 30’s computer during all three instances of unauthorized access on his officemate’s computer, and the network activity did not appear out of line with 30’s normal network activity. We found no other instance of both computers in an office being compromised simultaneously. Finally, only employees 30 and 27 had access to Embassy computers during all 18 unauthorized data transmissions. According to the prox card data, the other 58 employees were in the Embassy’s classified space during at least one of the unauthorized transmissions.

Employee 30 attempted to avoid detection by using other employees’ computers to transmit the data and by doing so when the respective offices were empty. By comparing the times of the unauthorized data transmissions with the IP log and prox card data of each affected employee, we verified that all of the authorized computer users and their respective officemates were (a) in the Embassy classified space, (b) confirmed to be or probably outside of the Embassy, or (c) away from their desks when the unauthorized access occurred. In the latter cases, we have no direct evidence that the individuals were away from their desks. Nevertheless, there are extended gaps in IP logs surrounding each instance of an unauthorized data transmission. These gaps could be meetings or other activities that occurred away from the individuals’ desks.

Criminal Organization Utilized Social Network to Coordinate Activities

We have analyzed Flitter account data to identify a social network that matches previous intelligence reporting on the criminal organization. This social network is structured in a way to provide several degrees of separation between the Embassy employee providing the information and the criminal organization’s leader. The Embassy employee has three handlers in the criminal organization. There is no direct communication among the three handlers. Rather, they all associate with a common middleman who in turn associates with the organization’s leader.

Our investigation has also revealed the cities where the members and leader of the criminal group reside. All three handlers (Flitter names Pettersson, Kushnir, and Reitenspies) reside in Prounov, Flovania, along with the Embassy employee (Flitter name Schaffter). This would enable the handlers to easily manage their relationship with the Embassy employee. The middleman, Good, lives in Kannvic, and the organization’s leader, Szemeredi, lives in Kouvnic. These geographical separations are probably used as a security measure to decrease the risk of detection or exposure of the network.

Based on previous intelligence reporting, we expected the leader to live in a larger city. Nevertheless, his presence in Kouvnic, which is close to the international border with Trium, may indicate ties to that country. Being close to the border could facilitate transport of illicit material as well as provide means for a quick escape if the network was exposed.

Possible Surveillance Footage of Exchange

We have acquired surveillance video from a location within walking distance of the Embassy that probably captured a meeting between the Embassy employee and a handler. A meeting between two handlers at that location is unlikely: the organization’s operational security practices ban Flitter communication between handlers, so they would be unlikely to meet in person so close to the Embassy.

The meeting involved an exchange of briefcases, which most likely contained payment to the Embassy employee for the exfiltrated data. In the surveillance video we observe two individuals meeting for approximately five minutes with two briefcases on the ground between them. At the conclusion of the meeting, one individual takes the other individual’s briefcase and they both leave. Since the Embassy data is being transmitted to the criminal organization electronically, there would be few reasons to meet other than to provide payment for the data.

Part of our video analysis involved attempting to correlate employee 30’s known movements with the video footage. We saw no suspicious incidents on the 24 January tape during times when 30 could have left the Embassy. Since 26 January was a Saturday and employee 30 did not enter the Embassy that day, we had no times to correlate. This of course does not rule out employee 30 as one of the possible persons of interest in the video. Additionally, if employee 30 were receiving payment, he probably would not want to bring it back to the Embassy.

Recommendations for Further Investigation

We have a high degree of confidence that employee 30 is our suspect. Nevertheless, this suspicion is based only on electronic evidence, which we have assumed to be accurate. Additional electronic evidence, such as direct connections between employee 30 and the “Schaffter” Flitter account, would strengthen our case. Interviews with other Embassy employees might reveal possible sightings of employee 30’s unauthorized computer use, since several unauthorized data transmissions occurred in relatively small or closing windows of opportunity based on IP log and prox data. We should also verify whether employee 30’s appearance matches either of the individuals in the surveillance footage.

Finally, we believe that the data exfiltration occurred over an online social networking service. We believe port 8080 is used by these social networking sites, which the Embassy employees are known to use. Of all the data transmissions on port 8080, those to the suspect IP address had the largest payload request sizes. Verification that port 8080 is used for social networking services, as well as investigation into the suspect IP address, 100.59.151.133, could validate this hypothesis.

GC.2: Who are the major players in the scenario and what are their relationships?

Summary

Based on analysis of the MC1 and MC2 data in the Palantir platform, we believe embassy employee 30 most likely is the malicious insider who transmitted embassy data to the outside criminal organization using the Flitter social networking website. We identified 18 probable instances of 30 using 12 different embassy computers to make unauthorized Data Transmissions to IP address 100.59.151.133, which we believe is the Flitter IP address. These data transmissions, all over port 8080, involved very large payload requests and occurred in vacant offices one to three times per day every Tuesday and Thursday over a four-week period.

Data Preparation and Import into Palantir

Prior to analyzing the MC1/MC2 datasets in Palantir, we first prepared the platform and data for import. To begin, we used Palantir’s Dynamic Ontology Manager to build an ontology to accurately model the MC1/MC2 data. For instance, we created proximity badge events and added a number of properties to Data Transmission event objects, such as packet size. Next we imported the classified space prox-in/prox-out timestamps paired together so we could import “Prox-event Classified” events with a duration of the entire time the employee was in the classified area.
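The pairing step described above can be sketched outside Palantir as follows. This is a minimal illustration, not the import configuration we actually used: the (timestamp, employee, direction) record layout and the function name pair_prox_events are assumptions made for the example.

```python
from datetime import datetime

def pair_prox_events(records):
    """Pair each prox-in with the next prox-out for the same employee.

    records: iterable of (timestamp, employee_id, direction) tuples, where
    direction is "in" or "out" (a hypothetical layout, not the MC1 file format).
    Returns (intervals, unmatched): intervals are (employee_id, start, end)
    tuples; unmatched holds records that have no partner.
    """
    pending = {}      # employee_id -> timestamp of a prox-in awaiting its prox-out
    intervals = []
    unmatched = []
    for ts, emp, direction in sorted(records):
        if direction == "in":
            if emp in pending:                   # second prox-in with no prox-out between
                unmatched.append((pending[emp], emp, "in"))
            pending[emp] = ts
        elif emp in pending:                     # prox-out closing an open interval
            intervals.append((emp, pending.pop(emp), ts))
        else:                                    # prox-out with no recorded prox-in
            unmatched.append((ts, emp, "out"))
    unmatched.extend((ts, emp, "in") for emp, ts in pending.items())
    return intervals, unmatched
```

Records left in the unmatched list correspond to open-ended events, such as a prox-out from the classified space with no preceding prox-in.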

In Palantir, the data import process is quite simple. The user adds a file or database to the import wizard, which allows the user to map columns in the data file to properties based on the chosen object type. The import wizard then automatically imports the data, creating objects with their respective properties, linking objects as specified, and resolving duplicate objects based on customizable resolution rules.

Screenshot 1

Investigational Hypotheses

One of Palantir’s strengths as an analytical platform is the ability to integrate different types of data into a single investigational environment. Simply searching for suspicious network traffic can be an impossible task, but by correlating proximity badge events with data transmissions we could easily identify unauthorized data transmissions and continue investigating from there. We developed a set of logical questions to ask about the data and designed workflows to answer those questions. We first focused our investigation by looking for “piggybacking” events involving the classified space. With all 4080 “Prox-event Classified” events on the graph, we identified five open-ended piggybacking events using Palantir’s timeline. Selecting these open-ended events on the timeline, we pulled these events from the group, and linked them to the associated employees: 30, 38, and 49.

Screenshot 2

Employee 30 had a suspicious pattern of multiple piggybacking events: he did not prox-in to the classified space but did prox-out on January 10, 17, and 24. Based on this suspicion, we investigated his officemate's computer for unauthorized Data Transmissions. Viewing all of the officemate’s Data Transmission and prox events in the timeline, we indeed found a Data Transmission that occurred while the employee was in the classified space. Highlighting that event in the Timeline, we can find it on the Graph and see in the Selection Helper that the Data Transmission is to IP address 100.59.151.133.

We then used Palantir’s Search Around application, which searches for specified target objects by property or link type based on an initial object selection, to search for other embassy computers that connected with the suspicious IP address. We found 18 Data Transmissions from 12 different embassy computers.
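For readers without access to Palantir, the effect of this Search Around can be approximated with a plain filter over the transmission records. The sketch below is only an analogy: the dictionary keys (source_ip, dest_ip) and the function name are assumptions for illustration and do not reflect Palantir's API or the MC1 column names.

```python
from collections import defaultdict

def group_by_source(transmissions, dest_ip):
    """Group the transmissions sent to dest_ip by the computer that sent them.

    transmissions: iterable of dicts with hypothetical keys "source_ip"
    and "dest_ip". Returns {source_ip: [transmission, ...]}.
    """
    by_source = defaultdict(list)
    for tx in transmissions:
        if tx["dest_ip"] == dest_ip:
            by_source[tx["source_ip"]].append(tx)
    return dict(by_source)
```

The number of keys in the result gives the count of distinct computers, and the total length of the lists gives the count of transmissions.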

Badge and Network Traffic Investigational Workflow

Working from these 18 Data Transmissions, we devised workflows to answer the following questions:

  1. Was the authorized computer user present during the Data Transmission?
  2. Was the officemate present during the Data Transmission?
  3. What employee had access to a computer during all 18 Data Transmissions?

The workflow involved bringing all 18 Data Transmissions to the graph, along with the associated computers, the authorized users, and the users’ officemates. From there we performed a Search Around to individually bring each employee’s associated events to the graph. We then used the Timeline, which can differentiate objects by color, to visually identify Data Transmissions that were probably unauthorized based on the computer owner being (1) in the classified space, (2) out of the building, or (3) possibly away from the computer based on the temporal pattern of other events.

Screenshot 3
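The three-way visual judgment made in the Timeline can be written down as a rough rule. The sketch below is a simplification under stated assumptions: we made the call by eye, so the 30-minute quiet window, the use of the first building badge-in as the "out of building" test, and all names here are hypothetical.

```python
from datetime import datetime, timedelta

def owner_status(tx_time, classified_intervals, building_entry, other_activity,
                 quiet=timedelta(minutes=30)):
    """Guess where a computer's owner was when a transmission occurred.

    classified_intervals: list of (start, end) stays in the classified space.
    building_entry: the owner's first badge-in that day, or None if absent.
    other_activity: timestamps of the owner's other network events.
    quiet: assumed window; no nearby activity suggests the owner was away.
    """
    if any(start <= tx_time <= end for start, end in classified_intervals):
        return "in classified space"
    if building_entry is None or tx_time < building_entry:
        return "out of building"
    if not any(abs(tx_time - t) <= quiet for t in other_activity):
        return "possibly away"
    return "probably present"
```

Only the first two outcomes rest on direct badge evidence; the third is an inference from a gap in the IP log.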

Finally, we created temporal filters for all 18 Data Transmissions and searched for all overlapping Prox-event Classified events, which we then linked to the associated employees in order to exclude them as possible suspects. Based on this search, we were left with 30 as our prime suspect. While verifying these suspicious transmissions, we were also able to see that there were no other unauthorized transmissions on these individuals’ computers.
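The elimination logic behind these temporal filters amounts to a set difference: anyone inside the classified space during at least one of the transmissions cannot have made all of them. A minimal sketch, assuming classified-space stays are available as (employee, start, end) tuples; the function name and data layout are illustrative only.

```python
from datetime import datetime

def remaining_suspects(employees, classified_intervals, tx_times):
    """Return the employees never in the classified space during a transmission.

    classified_intervals: iterable of (employee_id, start, end) tuples.
    tx_times: timestamps of the suspicious transmissions.
    """
    excluded = {
        emp
        for emp, start, end in classified_intervals
        if any(start <= t <= end for t in tx_times)
    }
    return set(employees) - excluded

# Made-up example: employee 12 is in the classified space during the first
# transmission and is therefore excluded.
t = lambda h, m: datetime(2008, 1, 8, h, m)
stays = [(27, t(9, 0), t(9, 30)), (12, t(10, 0), t(11, 0)), (30, t(14, 0), t(15, 0))]
suspects = remaining_suspects([12, 27, 30], stays, [t(10, 30), t(16, 0)])
```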

Patterns of Suspicious Usage

Of the 18 Data Transmissions to the suspicious IP address, 8 occurred while the assigned user was in or just exiting the classified space, 6 occurred while the user probably was out of the office, and 4 occurred during a period of no other network activity on the computer, which could plausibly indicate the user was, say, away in an unclassified meeting. Only employee 30 probably was present during an officemate’s suspicious Data Transmission, so he either witnessed the unauthorized access or is the malicious insider. It is unlikely that employee 30 was absent at the time his officemate’s computer was compromised. The IP logs show network activity on 30’s computer during all three instances of unauthorized access on his officemate’s computer, and the network activity did not appear out of line with 30’s normal network activity. In six other cases of suspicious traffic the officemate was in the classified space during the transmission, and in the remainder of cases the officemate probably was outside the building or otherwise away from the desk.

Screenshot 4

Viewing the 18 transmissions in the Time Wheel, we can see that they occur one to three times per day on Tuesdays and Thursdays over a four-week period. We also see that the 18 transmissions that used port 8080 were among the largest payload requests during the month and that the 18 transmissions had the largest payload requests of all port 8080 network traffic. Although 30 had some Data Transmissions in close proximity to the times of some unauthorized transmissions at other computers, we feel that the evidence, with a high degree of confidence, points to employee 30 as the malicious insider.

Screenshot 5
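The two observations drawn from the Time Wheel (the day-of-week pattern and the payload ranking on port 8080) can be checked with simple aggregation. The following is a sketch only; the record keys "port" and "req_bytes" and both function names are assumptions for illustration.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")

def per_day_counts(tx_times):
    """Count transmissions per calendar day, keyed by (weekday name, date)."""
    return Counter((WEEKDAYS[t.weekday()], t.date()) for t in tx_times)

def largest_on_port(transmissions, port, n):
    """Return the n transmissions with the largest request payload on a port.

    transmissions: iterable of dicts with hypothetical keys "port" and
    "req_bytes" (request payload size in bytes).
    """
    on_port = [tx for tx in transmissions if tx["port"] == port]
    return sorted(on_port, key=lambda tx: tx["req_bytes"], reverse=True)[:n]
```

If the 18 flagged transmissions are exactly the top 18 returned for port 8080, the payload observation holds.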

Data Exfiltration through Social Networking

We suspect that port 8080 is used for Flitter network traffic, that IP address 100.59.151.133 belongs to the Flitter network, and that the malicious insider (probably employee 30) is using Flitter to exfiltrate data from the embassy to his external nefarious network. We found three networks that possibly match the two scenarios provided. The network we submitted matches structure A, with Flitter account Schaffter as the employee in the center of the network. We have also submitted two other networks, which match structure B. These networks center on employees Bailey and Terekhov.

The differences between the network discovered and the network as described in the challenge are few. In the structure A network we found, the only significant difference was in the geospatial information (see below). The employee had 40 Flitter contacts (exact match), and the handlers each had 30-40 contacts (exact match) and did not contact one another (exact match). The middleman was connected to each handler and only two other people (exact match). The leader had well over 100 contacts, some of which were international (exact match).

Like MC1, the data for MC2 first had to be imported into the Palantir Revisioning Database. This process is relatively simple, only requiring two imports using the dedicated Palantir Job server. First, the Flitter accounts were imported. When finished, 6000 objects are created, labeled with the Flitter name, and having the Flitter ID number and city/country as properties. Next, the links are imported in a similar manner. In this case, we resolve the links to the Flitter accounts using a Palantir Entity Resolution Suite. Finally, the map of Flovania can be easily added to the Palantir map application as a tile map overlay. We mapped the cities on the Flovanian map to the real-world geo-coordinates to enable geo-analysis in the Palantir Map.

Screenshot 6

The bulk of the social network analysis was performed in the Graph and Histogram, and networks were built with the Search Around tool, which allows the analyst to make queries such as, “Show me all individuals connected through Flitter that have X number of Flitter contacts themselves”.

Social Network Investigational Workflow

The basic workflow was as follows:

  1. Using a Palantir filter, search for all Flitter accounts that have 38-42 Flitter contacts that are located in Flovania (23 Flitter contacts returned).

    Screenshot 7

  2. Using Search Around, determine which of these 23 have 3 or more contacts that have 30-40 contacts themselves (possible handlers). This narrowed down our 23 Flitter contacts to 5 suspects: Bailey, Campr, Lafouge, Schaffter, and Terekhov.

    Screenshot 8

  3. Starting with each of the 5 suspects, perform a Search Around to bring back all the potential handlers.
  4. Next, perform a Search Around on the handlers to bring back the potential middlemen. We searched for all contacts of the handlers that have fewer than 6 contacts (as we are informed that the middlemen do not have that many Flitter contacts). With the handlers placed in the corners of the graph, a single centralized middleman (Structure A) will appear in the middle of the graph workspace. If it is a Structure B network, you will simply see a circle of new contacts around the handlers.

    Screenshot 9

  5. Next, perform a Search Around on all the potential middlemen, looking for contacts of theirs that have more than 100 contacts. If it is Structure A, you will see each handler communicating with only one other contact, who then communicates with a secondary contact (the fearless leader). If it is Structure B, you will see three contacts in the circles around the handlers all connecting to only one matching contact (the fearless leader).
  6. Finally, to make sure that the leader is not connected to the handlers or employee, perform one final Search Around to bring back all of the leader’s contacts. If you see a connection to a handler or the employee, then this is NOT the correct network.

    Screenshot 10
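The six steps above can be condensed into a short search for the Structure A case. This is a sketch under assumptions, not the Palantir workflow itself: it takes the Flitter links as a symmetric dictionary in which every account appears as a key, hard-codes the thresholds quoted in the steps, and uses hypothetical names throughout. Structure B, with one middleman per handler, would need a separate variant.

```python
from itertools import combinations

def find_structure_a(contacts, in_flovania):
    """Yield (employee, handlers, middleman, leader) tuples matching Structure A.

    contacts: dict mapping account name -> set of contact names; assumed
        symmetric, with every account present as a key.
    in_flovania: set of account names located in Flovania.
    """
    degree = {name: len(links) for name, links in contacts.items()}
    for employee in contacts:
        # Step 1: employee has 38-42 contacts and is located in Flovania.
        if employee not in in_flovania or not 38 <= degree[employee] <= 42:
            continue
        # Steps 2-3: candidate handlers have 30-40 contacts themselves.
        candidates = sorted(c for c in contacts[employee] if 30 <= degree[c] <= 40)
        for handlers in combinations(candidates, 3):
            # Handlers must not be in contact with one another.
            if any(b in contacts[a] for a, b in combinations(handlers, 2)):
                continue
            # Step 4: one low-degree middleman shared by all three handlers.
            shared = set.intersection(*(contacts[h] for h in handlers)) - {employee}
            for middleman in sorted(m for m in shared if degree[m] < 6):
                # Step 5: the leader is a middleman contact with over 100 contacts.
                for leader in sorted(contacts[middleman]):
                    if degree[leader] <= 100:
                        continue
                    # Step 6: the leader must not touch the employee or handlers.
                    if contacts[leader] & ({employee} | set(handlers)):
                        continue
                    yield employee, handlers, middleman, leader
```

Trying every combination of three candidate handlers is affordable here because step 2 leaves only a handful of candidates per employee.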

Social Network and Surveillance Video Analysis

We found three possible networks: one with structure A, and two with structure B. We submitted the structure A network as this matched most closely with the intelligence about the network. If this is the case, then employee 30 and Flitter account Schaffter probably are the same individual.

Palantir is a powerful platform for geospatial analysis. Since we had geo-coded all Flitter accounts based on the city location before import, we simply dragged the selected Flitter objects from the Graph workspace to the Map workspace. Our geospatial analysis indicated that the employee and his handlers all lived in Prounov. This implies that the handlers probably were in direct contact with the employee for handoffs, financial exchanges, etc.

This information allows us to assume that in the video surveillance it is one of the handlers who makes an exchange with the malicious Embassy employee. This exchange occurred on 26 January from approximately 11:24 AM to 11:30 AM. In this video footage we observed a man waiting with a briefcase (location 2). A woman with another briefcase begins interacting with the man, and the man takes the woman’s briefcase when they separate.

In examining the surveillance video, we imported metadata for 10-minute segments of footage with links to launch them in a media player. Since these “Sighting” events have timestamps, they are searchable using temporal filters in the Timeline. We searched times on 24 January around gaps in employee 30’s prox data and IP logs that were long enough for him to leave the Embassy, since the video camera is within walking distance of the Embassy. Review of the 24 January footage revealed no suspicious incidents.
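The gap-based selection of footage can be expressed as two small steps: find gaps in the employee's combined prox and IP activity, then keep the 10-minute segments that overlap a gap. This is a sketch only; the minimum gap length is left as a parameter because we did not fix a specific threshold, and the function names are hypothetical.

```python
from datetime import datetime, timedelta

def activity_gaps(event_times, min_gap):
    """Return (start, end) pairs between consecutive events at least min_gap apart."""
    times = sorted(event_times)
    return [(a, b) for a, b in zip(times, times[1:]) if b - a >= min_gap]

def segments_in_gaps(segment_starts, gaps, segment_length=timedelta(minutes=10)):
    """Return the start times of video segments that overlap any gap."""
    return [
        s for s in segment_starts
        if any(s < gap_end and s + segment_length > gap_start
               for gap_start, gap_end in gaps)
    ]
```

On a day with no activity at all, such as 26 January for employee 30, activity_gaps returns nothing, which is why no filtered search was possible for that date.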

Employee 30 did not enter the Embassy on 26 January, so we had no times to correlate for our filtered searches. Nevertheless, we discovered the suspicious incident noted above. In Palantir, we can create searchable events and link them to the surveillance footage and to any individuals who appear in it. We created a “Meeting” event linked to the footage. If we verify that employee 30’s appearance matches one of the individuals in the meeting, we can then link this meeting to employee 30. Furthermore, an analyst making this connection could publish it from his own investigational sandbox to the enterprise database.

Returning to the geospatial analysis, we further discovered that the middleman lives in Kannvic and the leader lives in a border city, Kouvnic. The leader’s location is interesting, as a border city would allow him to exfiltrate data across the border into the neighboring country of Trium. In general, geographic analysis allowed us to provide greater context to our overall analysis.

The overall structure indicates that embassy employee 30 was using Flitter social networking under account Schaffter to exfiltrate data outside the embassy network. The handlers would pass this information to the middleman, and the middleman would pass it to the leader. As the leader is located in a border city, he can possibly facilitate the transport of classified information across international borders.

Screenshot 11